Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
Authors
Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein
Abstract
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction, which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations.
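The abstract states that the Chandy-Lamport snapshot can be expressed with the GraphLab abstraction itself. The following Python sketch is only a rough illustration of that idea, not the actual GraphLab C++ API: a vertex-level "snapshot update" saves its own data and its adjacent edge data the first time it is scheduled, then schedules its neighbors to propagate the marker. The Graph class, the snapshot_update function, and the explicit FIFO scheduler are illustrative assumptions introduced here for the example.

# A minimal, self-contained sketch (assumed names, not GraphLab's real API) of
# phrasing a Chandy-Lamport-style snapshot as an ordinary vertex update function.
from collections import deque

class Graph:
    """Tiny undirected graph with per-vertex and per-edge data."""
    def __init__(self):
        self.vertex_data = {}   # vid -> vertex state
        self.edge_data = {}     # frozenset({u, v}) -> edge state
        self.neighbors = {}     # vid -> set of adjacent vids

    def add_edge(self, u, v, data=None):
        self.vertex_data.setdefault(u, 0)
        self.vertex_data.setdefault(v, 0)
        self.neighbors.setdefault(u, set()).add(v)
        self.neighbors.setdefault(v, set()).add(u)
        self.edge_data[frozenset((u, v))] = data

def snapshot_update(vid, graph, snapshotted, saved, scheduler):
    """Save local state exactly once, then wake neighbors.

    Each vertex records its own data and adjacent edge data the first time the
    snapshot marker reaches it; in distributed GraphLab this runs concurrently
    under the consistency model, whereas here it is simulated sequentially.
    """
    if vid in snapshotted:
        return                                   # marker already processed here
    snapshotted.add(vid)
    saved["vertices"][vid] = graph.vertex_data[vid]
    for nbr in graph.neighbors[vid]:
        e = frozenset((vid, nbr))
        saved["edges"][e] = graph.edge_data[e]   # record adjacent edge data
        scheduler.append(nbr)                    # propagate the marker

if __name__ == "__main__":
    g = Graph()
    for u, v in [(1, 2), (2, 3), (3, 1), (3, 4)]:
        g.add_edge(u, v, data=(u, v))

    snapshotted, saved = set(), {"vertices": {}, "edges": {}}
    scheduler = deque([1])                       # initiate the snapshot at vertex 1
    while scheduler:
        snapshot_update(scheduler.popleft(), g, snapshotted, saved, scheduler)

    print(len(saved["vertices"]), "vertices and", len(saved["edges"]), "edges saved")

Running the sketch on the small four-vertex graph saves every vertex and edge exactly once, which is the property that makes the saved pieces form a consistent snapshot when the same pattern is executed as a GraphLab update function.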
Similar resources
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud
While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction which naturally expresses asynchronou...
GraphLab: A Distributed Framework for Machine Learning in the Cloud
Machine Learning (ML) techniques are indispensable in a wide range of fields. Unfortunately, the exponential increase of dataset sizes is rapidly extending the runtime of sequential algorithms and threatening to slow future progress in ML. With the promise of affordable large-scale parallel computing, Cloud systems offer a viable platform to resolve the computational challenges in ML. However, ...
GraphLab: A Distributed Abstraction for Large Scale Machine Learning
Machine Learning methods have found increasing applicability and relevance to the real world, with applications in a broad range of fields including robotics, data mining, physics, and biology, among many others. However, with the growth of the World Wide Web, and with improvements in data collection technology, real-world datasets have been rapidly increasing in size and complexity, necessitating c...
Optimization Task Scheduling Algorithm in Cloud Computing
Since software systems play an important role in applications more than ever, security has become one of the most important indicators of software. Cloud computing refers to services that run in a distributed network and are accessible through common internet protocols. Presenting a proper scheduling method can improve resource efficiency by decreasing response time and costs. This rese...
Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions
Applying popular machine learning algorithms to large amounts of data has raised new challenges for ML practitioners. Traditional ML libraries do not support processing of huge datasets well, so new approaches were needed. Parallelization using modern parallel computing frameworks, such as MapReduce, CUDA, or Dryad, gained in popularity and acceptance, resulting in new ML libraries develo...
Journal: PVLDB
Volume: 5, Issue: -
Pages: -
Publication year: 2012